{"nbformat":4,"nbformat_minor":0,"metadata":{"anaconda-cloud":{},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.5.2"},"colab":{"name":"Tutorial IV.ipynb","provenance":[],"collapsed_sections":["l7s9QvYetmcY","BGrGmp-5tmcV","uwVORI1Vtmca","zljWXlXquHgp","UPrth-Ertmck","N_DKz0sKtmco","FPzKLLPitmcr"],"toc_visible":true}},"cells":[{"cell_type":"markdown","metadata":{"id":"1oZByfpftmcT","colab_type":"text"},"source":["# Tutorial IV: Convolutions"]},{"cell_type":"markdown","metadata":{"id":"N__G0gDAtmcU","colab_type":"text"},"source":["

\n","Bern Winter School on Machine Learning, 27-31 January 2020
\n","Prepared by Mykhailo Vladymyrov.\n","

\n","\n","This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License."]},{"cell_type":"markdown","metadata":{"id":"knW3JBEDtmcU","colab_type":"text"},"source":["In this session we will look at the convolution operation and try to build some intuition about it.\n","We will also look at one of the state-of-the-art deep models, [Inception](https://arxiv.org/abs/1602.07261), which is designed to perform image recognition."]},{"cell_type":"markdown","metadata":{"id":"l7s9QvYetmcY","colab_type":"text"},"source":["## 1. Load necessary libraries"]},{"cell_type":"code","metadata":{"id":"3G6x1ENecsyd","colab_type":"code","colab":{}},"source":["# if using google colab\n","%tensorflow_version 2.x"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"cjsvvAJatmcY","colab_type":"code","colab":{}},"source":["import sys\n","import os\n","\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import IPython.display as ipyd\n","import tensorflow.compat.v1 as tf\n","tf.disable_v2_behavior()\n","from PIL import Image\n","\n","# We'll tell matplotlib to inline any drawn figures like so:\n","%matplotlib inline\n","plt.style.use('ggplot')\n","\n","\n","from IPython.core.display import HTML\n","HTML(\"\"\"\"\"\")"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"BGrGmp-5tmcV","colab_type":"text"},"source":["### Download libraries"]},{"cell_type":"code","metadata":{"id":"nL1BzlxC5PWy","colab_type":"code","colab":{}},"source":["p = tf.keras.utils.get_file('./material.tgz', 'https://scits-training.unibe.ch/data/tut_files/t4.tgz')\n","!mv {p} .\n","!tar -xvzf material.tgz > /dev/null 2>&1"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"vOpPZ2Hf5aB8","colab_type":"code","colab":{}},"source":["from utils import gr_disp\n","from utils import inception"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"uwVORI1Vtmca","colab_type":"text"},"source":["## 2. Convolutions"]},{"cell_type":"markdown","metadata":{"id":"o0i6LmfYtmcb","colab_type":"text"},"source":["In a fully connected network, all inputs of a layer are connected to all neurons of the following layer.\n","\n","
In convolutional nets, each neuron is connected only to a local neighbourhood of inputs, and the same weights are shared across all neighbourhoods.
\n"]},{"cell_type":"markdown","metadata":{"id":"D_nODHgbtmcb","colab_type":"text"},"source":["Let's see what a convolution is, and how it behaves."]},{"cell_type":"code","metadata":{"id":"dH2IPjiftmcc","colab_type":"code","colab":{}},"source":["#load image, convert to gray-scale and normalize\n","img_raw = plt.imread('ML3/chelsea.jpg').mean(axis=2)[-256:, 100:356].astype(np.float32)\n","img_raw = (img_raw-img_raw.mean())/img_raw.std()\n","\n","plt.imshow(img_raw, cmap='gray')\n","plt.grid(False)\n","img_raw4d = img_raw[np.newaxis,...,np.newaxis]"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Zi3rfvEitmce","colab_type":"code","colab":{}},"source":["g = tf.Graph()\n","with g.as_default():\n","    # convolve x 4 times in a row with the same 5x5 filter\n","    x = tf.placeholder(dtype=tf.float32, shape=(1,256,256,1),name='img')\n","    flt = tf.placeholder(dtype=tf.float32, shape=(5,5,1,1), name='flt')\n","    y1 = tf.nn.conv2d(x , flt, strides=[1,1,1,1], dilations=[1, 1, 1, 1], padding='VALID', name='convolved') #[1,2,2,1]\n","    y2 = tf.nn.conv2d(y1, flt, strides=[1,1,1,1], dilations=[1, 1, 1, 1], padding='VALID', name='convolved')\n","    y3 = tf.nn.conv2d(y2, flt, strides=[1,1,1,1], dilations=[1, 1, 1, 1], padding='VALID', name='convolved')\n","    y4 = tf.nn.conv2d(y3, flt, strides=[1,1,1,1], dilations=[1, 1, 1, 1], padding='VALID', name='convolved')\n","    "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"DOEB24sLtmcg","colab_type":"code","colab":{}},"source":["flt_mtx = [\n","    [ 0, 0, 0, 0, 0,],\n","    [ 0, 0, 0, 0, 0,],\n","    [ 0, 0, 1, 0, 0,],\n","    [ 0, 0, 0, 0, 0,],\n","    [ 0, 0, 0, 0, 0,],\n","]\n","\n","with tf.Session(graph=g) as sess:\n","    flt_mtx_np = np.array(flt_mtx, np.float32)\n","    flt_mtx_np = flt_mtx_np[..., np.newaxis, np.newaxis]\n","    res = sess.run([x,y1,y2,y3,y4], feed_dict={x:img_raw4d, flt:flt_mtx_np})\n","res = [r[0,...,0] for r in res]\n","\n","\n","n = len(res)\n","fig, ax = plt.subplots(1, n+1, figsize=(n*4, 4))\n","for col in range(n):\n","    ax[col].imshow(res[col], cmap='gray')\n","    ax[col].grid(False)\n","    ax[col].set_title('conv %d'% col if col else 'raw')\n","\n","ax[n].imshow(flt_mtx, cmap='gray')\n","ax[n].grid(False)\n","_=ax[n].set_title('filter')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Van15HojduYJ","colab_type":"code","colab":{}},"source":["# 1D filter profiles sampled at n points on [-3, 3]\n","def gaussian(n=5):\n","    x = np.linspace(-3, 3, n)\n","    y = np.exp(-x**2 * 0.5) / np.sqrt(2*np.pi)\n","    return y\n","\n","# odd, first-derivative-like profile (up to scale)\n","def dgaussian(n=5):\n","    x = np.linspace(-3, 3, n)\n","    y = - 2 * x * np.exp(-x**2 * 0.5) / np.sqrt(2*np.pi)\n","    return y\n","\n","# even, second-derivative-like profile (up to scale and sign)\n","def ddgaussian(n=5):\n","    x = np.linspace(-3, 3, n)\n","    y = - 2 * (2*x**2 - 1) * np.exp(-x**2 * 0.5) / np.sqrt(2*np.pi)\n","    return y\n","    \n","# radially symmetric 2D version of ddgaussian, shifted to zero mean\n","def ddgaussian2d(n=5):\n","    c = np.linspace(-3, 3, n)\n","    r = np.asarray([[np.sqrt(xi**2+yi**2) for xi in c] for yi in c])\n","    f = lambda x: (- 2 * (2*x**2 - 1) * np.exp(-x**2 * 0.5) / np.sqrt(2*np.pi))\n","\n","    y = f(r)\n","    y -= y.mean()\n","    return y\n","    "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"OljkMi3jgkh1","colab_type":"code","colab":{}},"source":["n = 30\n","gf = np.tile(gaussian(n)[np.newaxis], [n, 1])\n","\n","dgf = np.tile(dgaussian(n)[np.newaxis], [n, 1])\n","\n","ddgf = ddgaussian(n)\n","ddgf -= ddgf.mean()\n","ddgf = np.tile(ddgf[np.newaxis], [n, 1])\n","\n","ddgf2d = ddgaussian2d(n)\n","rf2d = lambda: np.random.normal(size=(5,5))\n","\n","\n","plt.plot(gf[0])\n","plt.plot(dgf[0])\n","plt.plot(ddgf[0])\n"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"-wlTUAf60RXF","colab_type":"code","colab":{}},"source":["plt.imshow(dgf*gf.transpose())\n","plt.grid(False)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"zljWXlXquHgp","colab_type":"text"},"source":["## 3. 
Homework"]},{"cell_type":"markdown","metadata":{"id":"bZBfo0m40vB6","colab_type":"text"},"source":["In the last session we used a fully connected network to classify digits.\n","Try to build a convolutional network: use three convolutional layers, then flatten the output and apply one fully connected layer.\n","You can use the following helper function. Note the stride parameter: it lets you downscale the feature maps.\n","To get an understanding of different convolution types, check the animations here."]},{"cell_type":"code","metadata":{"code_folding":[0],"id":"5BoKD4jltmci","colab_type":"code","colab":{}},"source":["def conv_2D(x, n_output_ch,\n","            k=3,\n","            s=1,\n","            activation=tf.nn.relu,\n","            padding='VALID', name='conv2d', reuse=None\n","            ):\n","    \"\"\"\n","    Helper for creating a 2d convolution operation.\n","\n","    Args:\n","        x (tf.Tensor): Input tensor to convolve, shaped [batch, height, width, channels].\n","        n_output_ch (int): Number of filters.\n","        k (int): Kernel width and height\n","        s (int): Stride in x and y\n","        activation (tf.Function): activation function to apply to the convolved data\n","        padding (str): Padding type: 'SAME' or 'VALID'\n","        name (str): Variable scope\n","        reuse (tf.Flag): Flag whether to use existing variable. Can be False(None), True, or tf.AUTO_REUSE\n","\n","    Returns:\n","        op (tf.Tensor, tf.Variable): Output of activation (or of the biased convolution if activation is None), weights\n","    \"\"\"\n","    with tf.variable_scope(name or 'conv2d', reuse=reuse):\n","        w = tf.get_variable(name='W',\n","                            shape=[k, k, x.get_shape()[-1], n_output_ch],\n","                            initializer=tf.initializers.he_uniform()\n","                            )\n","        \n","        wx = tf.nn.conv2d(name='conv',\n","                          input=x, filter=w,\n","                          strides=[1, s, s, 1],\n","                          padding=padding\n","                          )\n","        \n","        b = tf.get_variable(name='b',\n","                            shape=[n_output_ch], initializer=tf.initializers.constant(value=0.0)\n","                            )\n","        h = tf.nn.bias_add(name='h',\n","                           value=wx,\n","                           bias=b\n","                           )\n","\n","        if activation is not None:\n","            x = activation(h, name=activation.__name__)\n","        else:\n","            x = h\n","    \n","    return x, w"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"OM-OC3L4wyTn","colab_type":"text"},"source":["You can start with something like this:\n"]},{"cell_type":"code","metadata":{"id":"zl3n54VPw-C5","colab_type":"code","colab":{}},"source":["...\n","x_train = x_train_2d[..., np.newaxis] # we need additional channel dimension\n","\n","....\n","\n","# conv_2D expects a 4-D input: [batch, height, width, channels]\n","X = tf.placeholder(name='X', dtype=tf.float32, shape=[None] + list(x_train.shape[1:]))\n","\n","L1, W1 = conv_2D(X, 16, name = 'C1')\n","L2, W2 = conv_2D(L1, 32, s=2, name = 'C2')\n","L3, W3 = conv_2D(L2, 32, s=2, name = 'C3')\n","\n","# Flatten is a layer class: instantiate it first, then call it on the tensor\n","L3_f = tf.keras.layers.Flatten()(L3)\n","\n","L4, W4 = fully_connected_layer(L3_f , 32, 'F1', activation=tf.nn.relu)\n","L5, W5 = fully_connected_layer(L4 , 10, 'F2')\n","\n","Y_onehot = tf.nn.softmax(L5, name='Logits')"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"oEnrC5c0z-Ci","colab_type":"text"},"source":["Play with the layer parameters. Can you get better performance than with the fully connected network?"]},{"cell_type":"markdown","metadata":{"id":"UPrth-Ertmck","colab_type":"text"},"source":["## 4. 
Load the model"]},{"cell_type":"markdown","metadata":{"id":"U5v3hUe4tmck","colab_type":"text"},"source":["The `inception` module here is a small helper that loads the Inception model and handles image preparation for training."]},{"cell_type":"code","metadata":{"id":"q5uQgNCwtmcl","colab_type":"code","colab":{}},"source":["net, net_labels = inception.get_inception_model()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"pDfUcRvRtmcn","colab_type":"code","colab":{}},"source":["#get model graph definition and change it to use GPU\n","gd = net\n","\n","str_dg = gd.SerializeToString()\n","#uncomment next line to use GPU acceleration\n","#str_dg = str_dg.replace(b'/cpu:0', b'/gpu:0') #a bit extreme approach, but works =)\n","gd = gd.FromString(str_dg)\n","\n","gr_disp.show(gd)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"N_DKz0sKtmco","colab_type":"text"},"source":["## 5. Create the graph"]},{"cell_type":"markdown","metadata":{"id":"4klBSAR9tmcp","colab_type":"text"},"source":["This whole model won't fit in GPU memory. We will take only the part from the input to the main output and copy it to a second graph, which we will use from here on."]},{"cell_type":"code","metadata":{"id":"W5O-bSOhtmcp","colab_type":"code","colab":{}},"source":["gd2 = tf.graph_util.extract_sub_graph(gd, ['output'])\n","g2 = tf.Graph() # graph that holds only the extracted subgraph\n","with g2.as_default():\n","    tf.import_graph_def(gd2, name='inception')\n","\n","gr_disp.show(g2.as_graph_def())"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"FPzKLLPitmcr","colab_type":"text"},"source":["## 6. Test the model"]},{"cell_type":"markdown","metadata":{"id":"QJIfPj7Wtmcs","colab_type":"text"},"source":["We will use one image to check the model. `img_preproc` is cropped to 256x256 pixels and slightly transformed by `inception.prepare_training_img` so that it can be used as input for the model. `inception.training_img_to_display` is then used to convert it back to a displayable image.\n"]},{"cell_type":"code","metadata":{"id":"k9HRNGmstmcs","colab_type":"code","colab":{}},"source":["img_raw = plt.imread('ML3/chelsea.jpg')\n","img_preproc = inception.prepare_training_img(img_raw)\n","img_deproc = inception.training_img_to_display(img_preproc)\n","_, axs = plt.subplots(1, 2, figsize=(10,5))\n","axs[0].imshow(img_raw)\n","axs[0].grid(False)\n","axs[1].imshow(img_deproc)\n","axs[1].grid(False)\n","plt.show()"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"C7fpmx0btmcu","colab_type":"text"},"source":["We then get the input and output tensors, and obtain the probability of each class for this image:"]},{"cell_type":"code","metadata":{"id":"nLrIdxsAtmcu","colab_type":"code","colab":{}},"source":["# From the graph we will get the input and output tensors. \n","# Any tensor and operation can be obtained by name\n","g2.device('/gpu:0')\n","with g2.as_default():\n","    x = g2.get_tensor_by_name('inception/input:0')\n","    softmax = g2.get_tensor_by_name('inception/output:0')\n","    \n","# Then we will feed the image into the graph and print the 5 classes with the highest probability\n","with tf.Session(graph=g2) as sess:\n","    res = np.squeeze(sess.run(softmax, feed_dict={x: img_preproc[np.newaxis]}))\n","    \n","    indexes_sorted_by_probability = res.argsort()[::-1]\n","    print([(res[idx], net_labels[idx])\n","           for idx in indexes_sorted_by_probability[:5]])"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"55kW3cf7trfs","colab_type":"code","colab":{}},"source":[""],"execution_count":0,"outputs":[]}]}